Information fusion for spoken document retrieval

نویسنده

  • Kenney Ng
چکیده

In this paper we investigate the fusion of different information sources with the goal of improving performance on spoken document retrieval (SDR) tasks. In particular, we explore the use of multiple transcriptions from different automatic speech recognizers, the combination of different types of subword unit indexing terms, and the combination of word and subword-based units. To perform retrieval, we use a novel probabilistic information retrieval model which retrieves documents based on maximum likelihood ratio scores. Experiments on the 1998 TREC-7 SDR task show that the use of these different information fusion approaches can result in significantly improved retrieval performance.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A robust fusion method for multilingual spoken document retrieval systems employing tiered resources

In this study, we present two novel fusion approaches to merge subword and word based retrieval methods within a multilingual spoken document retrieval (SDR) system. Considering the fact that more than 6000 languages are spoken in the world today, resources (e.g., text and audio data, pronunciation lexicon) needed to develop Automatic Speech Recognition (ASR) systems for such a range of languag...

متن کامل

Information fusion for monolingual and cross-language spoken document retrieval

of thesis entitled: Information fusion for monolingual and cross-language spoken document retrieval Submitted by LO Wai-Kit for the degree of Doctor of Philosophy at The Chinese University of Hong Kong in October 2002 Spoken document retrieval (SDR) is an important technique that enables relevant information to be searched from spoken data archives. With the advent of Internet and multimedia te...

متن کامل

Pitt at CLEF05: Data Fusion for Spoken Document Retrieval

This paper describes an experimental investigation of data fusion techniques for spoken document retrieval. The effectiveness of retrievals solely based on the outputs from automatic speech recognition (ASR) is subject to the recognition errors introduced by the ASR process. This is especially true for retrievals on Malach test collection, whose ASR outputs have average word error rate (WER) of...

متن کامل

Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval

This paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of AoE-IT Multimedia Repository. We have also developed the Multimedia Markup Language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese sylla...

متن کامل

Multi-Scale Spoken Document Retrieval for Cantonese Broadcast News

This paper presents the application of a multi-scale paradigm to Chinese spoken document retrieval (SDR) for improving retrieval performance. Multi-scale refers to the use of both words and subwords for retrieval. Words are basic units in a language that carry lexical meaning and subword units (such as phonemes, syllables or characters) are building components for words. Retrieval using subword...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000